Decision Support with Belief Functions Theory for 

Abstract: Seabed characterization from sonar images is a very
hard task, even for a human expert, because of the nature of the
data produced and the unknown environment. In this work we propose
an original approach for combining binary classifiers arising from
different kinds of strategies, such as one-versus-one or
one-versus-rest, usually used in SVM classification. The decision
functions coming from these binary classifiers are interpreted in
terms of belief functions so that they can be combined with one of
the numerous operators of the belief functions theory. Moreover,
this interpretation of the decision function allows us to propose a
decision process that takes into account both the rejection of
observations too far removed from the learning data and imprecise
decisions given on unions of classes. This new approach is
illustrated and evaluated with an SVM in order to classify the
different kinds of sediment on sonar images.

Keywords: belief functions theory, decision support, SVM, 
sonar image. 

I. Introduction 

Sonar images are obtained from temporal measurements
made by a lateral or frontal sonar towed behind a boat.
Each emitted signal is reflected on the bottom, then received on
the antenna of the sonar with an adjustable delayed intensity.
Received data are very noisy. There are interferences
due to the signal travelling on multiple paths (reflection on the
bottom or surface), to speckle, and to fauna and flora.
Therefore, sonar images are characterized by imprecision
and uncertainty; thus sonar image classification is a difficult
problem [1]. Figure 1 shows the differences between the
interpretation and the certainty of two sonar experts trying
to differentiate types of sediment (rock, cobbles, sand, ripple,
silt) or shadow when the information is invisible (each color
corresponds to a kind of sediment and the associated certainty
of the expert is expressed in terms of sure, moderately sure
and not sure) [2].

Fig. 1. Segmentation given by two experts.

The automatic classification approaches for sonar images
are based on texture analysis and a classifier such as an SVM
[3]. The support vector machine (SVM) is based on an
optimization approach in order to separate two classes by
a hyperplane. For pattern recognition with several classes,
this optimization approach is possible (see [4]) but time
consuming. Hence a preferable solution is to combine the binary
classifiers according to a classical strategy such as
one-versus-one or one-versus-rest. The combination of these
classifiers is generally performed with very simple approaches
such as a voting rule or a maximization of the decision functions
coming from the classifiers. However, many combination operators
can be used, especially in the belief functions framework (cf. [5]).
Belief functions theory has already been employed in order to
combine the binary classifiers originating from SVM (see [6],
[7]). The operators in the belief functions theory deal with the
conflict arising from the binary classifiers. Another interest
of this theory is that we can obtain a belief degree on the
unions of classes and not only on exclusive classes. Indeed
the decisions of the binary classifiers can be difficult to take
when data overlap. From the decision function, we can define
probabilities in order to combine them (cf. [8]). However, a
probability measure is an additive measure and so probabilities
cannot easily provide a decision on unions of classes, unlike
belief functions.

Hence, once the binary classifiers have been combined, we
propose to use belief functions in order to take the decision for
one class only if this class is credible enough, and for the union
of two or more classes otherwise. Moreover, depending on the
application, it could be interesting not to take the decision on
one of the learning classes, and to reject data too far from the
learning classes. Many classical approaches are possible in
pattern recognition for outlier rejection (see [9], [10]). We
propose here to integrate outlier rejection in our decision
process based on belief functions.

In addition to this new decision process, the originality of
the paper lies in the modeling that we propose, i.e. how
to define the belief functions on the basis of the decision
functions coming from the binary classifiers.

This paper is organized as follows: in section II we recall
the principle of support vector machines for classification.
Next, we present the belief functions theory in section III, in
order to propose in section IV our belief approach to combine
the binary classifiers and to provide a decision process
allowing outlier rejection and indecision, expressed as
possible decisions on unions. This approach is evaluated for
seabed characterization on sonar images in section V.

II. SVM FOR CLASSIFICATION 

Support vector machines were introduced by [11], based on
statistical learning theory. SVM can be used for
estimation, regression or pattern recognition, as in this paper.

A. Principle of the SVM 

The support vector machine approach is a binary classification
method. It classifies positive and negative patterns
by searching for the optimal hyperplane that separates the two
classes, while guaranteeing a maximum distance between the
nearest positive and negative patterns. The hyperplane that
maximizes this distance, called the margin, is determined by
particular patterns called support vectors, situated at the bounds
of the margin. Only these few support vectors are used to
classify a new pattern, which makes SVM very fast. The power
of SVM is also due to their simplicity of implementation and
to solid theoretical bases.

If the patterns are linearly separable, we search for the
hyperplane $y = w.x + b$ which maximizes the margin between the
two classes, where $w.x$ is the dot product of $w$ and $x$, and $b$
the bias. Thus $w$ is the solution of the convex optimization
problem:

$$\min \; \|w\|^2 / 2 \tag{1}$$

subject to:

$$y_t (w.x_t + b) - 1 \geq 0 \quad \forall t = 1, \ldots, l, \tag{2}$$

where $x_t \in \mathbb{R}^d$ stands for one of the $l$ learning data, and
$y_t \in \{-1, +1\}$ the associated class. We can solve this
optimization problem with the following Lagrangian:

$$L = \|w\|^2 / 2 - \sum_{t=1}^{l} \lambda_t \left( y_t (w.x_t + b) - 1 \right), \tag{3}$$

where the $\lambda_t \geq 0$ are the Lagrange multipliers, satisfying

$$\sum_{t=1}^{l} \lambda_t y_t = 0.$$

If the data are not linearly separable, the constraints (2) are
relaxed with the introduction of positive terms $\xi_t$. In this case
we seek to minimize:

$$\|w\|^2 / 2 + C \sum_{t=1}^{l} \xi_t, \tag{4}$$

with the constraints given for all $t$:

$$\begin{cases} y_t (w.x_t + b) \geq 1 - \xi_t \\ \xi_t \geq 0 \end{cases} \tag{5}$$

where $C$ is a constant given by the user in order to weight the
error. This problem is solved in the same way as the linearly
separable case, with Lagrange multipliers $0 \leq \lambda_t \leq C$.

To classify a new pattern $x$ we simply need to study the
sign of the decision function given by:

$$f(x) = \sum_{t \in SV} y_t \lambda_t \, x_t.x + b, \tag{6}$$

where $SV = \{t \,;\, \lambda_t > 0\}$ for the separable case and
$SV = \{t \,;\, 0 < \lambda_t < C\}$ for the non-separable case is the
set of indices of the support vectors, and the $\lambda_t$ are the
Lagrange multipliers.

In the nonlinear case, the common idea of kernel
approaches is to map the data into a high-dimensional space. To do
that we use a kernel function that must be bilinear, symmetric
and positive, and that corresponds to a dot product in the new space.
The classification of a new pattern $x$ is given by the sign of
the decision function:

$$f(x) = \sum_{t \in SV} y_t \lambda_t K(x, x_t) + b, \tag{7}$$

where $K$ is the kernel function. The most used kernels are the
polynomial kernel $K(x, x_t) = (x.x_t + 1)^\delta$, $\delta \in \mathbb{N}$, and the radial
basis function $K(x, x_t) = e^{-\gamma \|x - x_t\|^2}$, $\gamma \in \mathbb{R}^+$. The choice
of the kernel is not always easy and is generally left to the user.
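As an illustration of how the sign of such a decision function is evaluated, the following minimal sketch computes a kernel expansion over a handful of support vectors with a radial basis function. The support vectors, labels, multipliers and bias are toy values chosen by hand, not a trained model, and the bias is assumed to enter additively (sign conventions differ between presentations).

```python
import math

def rbf_kernel(x, xt, gamma):
    """Radial basis function kernel exp(-gamma * ||x - xt||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xt))
    return math.exp(-gamma * sq_dist)

def decision_function(x, support_vectors, labels, multipliers, bias, gamma):
    """Sum of y_t * lambda_t * K(x, x_t) over the support vectors, plus the bias."""
    total = sum(y * lam * rbf_kernel(x, xt, gamma)
                for xt, y, lam in zip(support_vectors, labels, multipliers))
    return total + bias

# Toy, hand-picked values (assumption): one support vector per class.
support_vectors = [(0.0, 0.0), (2.0, 0.0)]
labels = [+1, -1]
multipliers = [1.0, 1.0]

f_near_pos = decision_function((0.1, 0.0), support_vectors, labels, multipliers, 0.0, 1.0)
f_near_neg = decision_function((1.9, 0.0), support_vectors, labels, multipliers, 0.0, 1.0)
# The predicted class is the sign of the decision function.
predicted = [1 if f >= 0 else -1 for f in (f_near_pos, f_near_neg)]
```

Only the support vectors enter the sum, which is why the evaluation stays cheap even when the learning set is large.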

B. Multi-class classification with SVM 

We can distinguish two kinds of approaches in order to
use SVM for classification with $n$ classes, $n > 2$. The first
one consists in fusing several binary classifiers given by the
SVM: the results obtained by each classifier are combined to
produce a final result, following strategies such as
one-versus-one or one-versus-rest. The second one consists in
considering the multi-class optimization problem directly.

• Direct approach: in [4], the notion of margin is extended
to the multi-class problem. However, this approach becomes
very time consuming, especially in the nonlinear case.

• one-versus-rest: This approach consists in learning $n$
decision functions $f_i$, $i = 1, \ldots, n$, given by equations (6)
or (7) according to the case, allowing the discrimination
of each class from the $n - 1$ others. The assignment of
a class $w_k$ to a new pattern $x$ is generally given by
the relation $k = \arg\max_{i = 1, \ldots, n} f_i(x)$. In the nonlinear
case, we have to be careful with the parameters of the kernel
functions, which could lead to decision functions of different
orders of magnitude from one binary classifier to another. So, it
could be better to decide on normalized functions calculated from
the decision functions (see [12], [13]).

• one-versus-one: Instead of learning $n$ decision functions,
we try here to discriminate each class from each other.
Hence we have to learn $n(n - 1)/2$ decision functions,
still given by equations (6) or (7) according to the
case. Each decision function is considered as a
vote in order to classify a new pattern $x$. The class of $x$
is given by the majority voting rule.
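The two classical combinations described in this list can be sketched as follows. The decision values are illustrative numbers, classes are indexed from 0, and the tie-breaking rule of the vote (smallest class index) is an arbitrary assumption, since the text does not specify one.

```python
from collections import Counter

def one_versus_rest(decision_values):
    """decision_values[i] = f_i(x); return the index with the largest value."""
    return max(range(len(decision_values)), key=lambda i: decision_values[i])

def one_versus_one(pairwise_values):
    """pairwise_values[(i, j)] = f_ij(x) with i < j; a non-negative value votes for i."""
    votes = Counter()
    for (i, j), value in pairwise_values.items():
        votes[i if value >= 0 else j] += 1
    # Ties are broken by the smallest class index (an arbitrary choice).
    best = max(votes.values())
    return min(c for c, v in votes.items() if v == best)

ovr_class = one_versus_rest([-0.4, 1.2, 0.3])
ovo_class = one_versus_one({(0, 1): -0.8, (0, 2): 0.5, (1, 2): 0.9})
```

Both rules keep only the sign or the rank of the decision functions, which is the loss of information that motivates richer combination operators.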

Some other methods have been proposed based on previous 
ones: 

• Error-Correcting Output Codes (ECOC): let $w_i$,
$i = 1, \ldots, n$, be the classes and $S_j$, $j = 1, \ldots, s$, the different
classifiers ($s = n$ in the one-versus-rest case and
$s = n(n - 1)/2$ in the one-versus-one case). The matrix of
codes $(M_{ij})$, with the classes in rows and the
classifiers in columns, stands for the contribution of each
classifier to the final result of the classification (based
on the error of all the classifiers). The final decision is
given by comparing the results of the classifiers with each
row of the matrix; the class of a new pattern $x$ is the
class giving the least error (see [14]).

• According to the decision functions, [8] defined a
probability (19) in order to normalize the decision
functions. Hence, we can combine the binary classifiers (for
both one-versus-rest and one-versus-one cases) with a
Bayesian rule (see [15]) or with simpler rules (see
[7]).

• DAGSVM (Directed Acyclic Graph SVM), proposed by
[16]: in this approach, the learning is made as in the
one-versus-one case, with the learning of $n(n - 1)/2$ binary
decision functions. In order to generalize, a binary decision tree is
considered where each node stands for a binary classifier
and each leaf stands for a class. Each binary classifier
eliminates a class and the class of a new pattern is the
class given by the last node.
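The elimination principle of the last item can be sketched with a list of candidate classes from which one class is removed at each node. The callable `pairwise_decision` is a hypothetical stand-in for the learned binary decision functions, and the convention that a non-negative value favours the first class is an assumption.

```python
def dag_svm(classes, pairwise_decision):
    """Eliminate one class per node until a single class remains.

    pairwise_decision(i, j) is a hypothetical callable returning f_ij(x):
    a non-negative value favours class i, a negative value favours class j.
    """
    candidates = list(classes)
    while len(candidates) > 1:
        first, last = candidates[0], candidates[-1]
        if pairwise_decision(first, last) >= 0:
            candidates.pop()      # class `last` is eliminated
        else:
            candidates.pop(0)     # class `first` is eliminated
    return candidates[0]

# Toy pairwise outputs for three classes (illustrative values only).
toy = {(0, 1): -0.8, (0, 2): 0.5, (1, 2): 0.9}
evaluations = []

def f(i, j):
    evaluations.append((i, j))
    return toy[(i, j)]

winner = dag_svm([0, 1, 2], f)
```

With n classes only n - 1 binary classifiers are evaluated for a new pattern, although all n(n - 1)/2 of them have to be learned.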

III. Belief functions theory 

The belief functions theory, also called evidence theory or
Dempster-Shafer theory (see [17], [18]), is more and more
employed in order to take into account the uncertainties
and imprecisions in pattern recognition. The belief functions
framework is based on the use of functions defined on
the power set $2^\Theta$ (the set of all the subsets of $\Theta$), where
$\Theta = \{w_1, \ldots, w_n\}$ is the set of exclusive and exhaustive
classes. These belief functions, or basic belief assignments, $m_j$
are defined by the mapping of the power set $2^\Theta$ onto $[0, 1]$ with
generally:

$$m_j(\emptyset) = 0, \tag{8}$$

and

$$\sum_{X \in 2^\Theta} m_j(X) = 1, \tag{9}$$

where $m_j(.)$ is the basic belief assignment for an expert (or a
binary classifier) $S_j$, $j = 1, \ldots, s$. Thus in the one-versus-rest
case $s = n$ and in the one-versus-one case $s = n(n - 1)/2$.

Equation (8) makes the assumption of a closed world
(that means that the classes are exhaustive) [18]. We can
define the belief functions only with:

$$m_j(\emptyset) \geq 0, \tag{10}$$

and the world is then open (cf. [19]). In order to change an
open world into a closed world, we can add one element to the
discriminating space, and this element can be considered as a
garbage class. The difficulty, as we will see later, is the mass
that we have to allocate to this element.

We have two advantages with the belief functions theory
compared to probabilities and Bayesian approaches. The
first one is the possibility for one expert (i.e. a binary classifier)
to decide that a new pattern belongs to the union of some
classes without needing to decide on a unique class. The basic
belief functions are not additive, which gives more freedom for
the modeling of some problems. The second one is the
modeling of some problems without any a priori, by giving
mass of belief to the ignorances (i.e. the unions of classes).

These simple conditions in equations (8) and (9) allow
a large panel of definitions of the belief functions, which
is one of the difficulties of the theory. From these basic
belief assignments, other belief functions can be defined, such
as credibility and plausibility. The credibility represents the
intensity with which the information given by one expert supports
an element of $2^\Theta$; this is a minimal belief function given for all
$X \in 2^\Theta$ by:

$$\mathrm{bel}(X) = \sum_{Y \subseteq X, \, Y \neq \emptyset} m_j(Y). \tag{11}$$

The plausibility represents the intensity with which there is no
doubt on one element. This function is given for all $X \in 2^\Theta$
by:

$$\mathrm{pl}(X) = \sum_{Y \in 2^\Theta, \, Y \cap X \neq \emptyset} m_j(Y) = \mathrm{bel}(\Theta) - \mathrm{bel}(X^c) = 1 - m_j(\emptyset) - \mathrm{bel}(X^c), \tag{12}$$

where $X^c$ is the complement of $X$ in $\Theta$.
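A minimal sketch of these two functions, assuming a basic belief assignment stored as a dictionary from `frozenset` focal elements to masses (a representation chosen here for illustration). The masses below are made-up values, not results from the experiments.

```python
def bel(m, X):
    """Credibility: total mass of the non-empty subsets of X."""
    return sum(mass for Y, mass in m.items() if Y and Y <= X)

def pl(m, X):
    """Plausibility: total mass of the focal elements that intersect X."""
    return sum(mass for Y, mass in m.items() if Y & X)

theta = frozenset({"rock", "sand", "silt"})
# Illustrative masses (assumption): some belief on a class, on a union, and on ignorance.
m = {
    frozenset({"rock"}): 0.5,
    frozenset({"sand", "silt"}): 0.3,
    theta: 0.2,
}
X = frozenset({"sand"})
complement = theta - X
```

Here nothing supports sand alone, so its credibility is zero, while its plausibility collects the mass of every focal element that does not contradict it; in a closed world the plausibility of X equals one minus the credibility of its complement.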

To keep a maximum of information, it is preferable to
combine the information given by the basic belief assignments
into a new basic belief assignment and take the decision on
one of the obtained belief functions. Many combination rules
have been proposed. The conjunctive rule proposed by [20]
allows us to stay in an open world. It is defined for $s$ experts,
and for $X \in 2^\Theta$, by:

$$m_{\mathrm{Conj}}(X) = \sum_{Y_1 \cap \ldots \cap Y_s = X} \; \prod_{j=1}^{s} m_j(Y_j), \tag{13}$$

where $Y_j \in 2^\Theta$ is the response of the expert $j$, and $m_j(Y_j)$
the corresponding basic belief assignment.

Initially, [17] and [18] proposed a normalized conjunctive
rule, in order to stay in a closed world. This rule is
defined for $s$ classifiers, for all $X \in 2^\Theta$, $X \neq \emptyset$, by:

$$m_{\mathrm{DS}}(X) = \frac{1}{1 - m_{\mathrm{Conj}}(\emptyset)} \sum_{Y_1 \cap \ldots \cap Y_s = X} \; \prod_{j=1}^{s} m_j(Y_j) = \frac{m_{\mathrm{Conj}}(X)}{1 - m_{\mathrm{Conj}}(\emptyset)}, \tag{14}$$

where $Y_j \in 2^\Theta$ is the response of the expert $j$, and $m_j(Y_j)$ the
corresponding basic belief assignment. $m_{\mathrm{Conj}}(\emptyset)$ is generally
interpreted as a conflict measure or, more exactly, as the
inconsistency of the fusion, because of the nonidempotence
of the rule. This rule, applied to basic belief assignments
whose only focal elements are the classes $w_i$ (i.e. some
probabilities), is equivalent to a Bayesian approach. A short
review of all the combination rules in the belief functions
framework and a number of new rules are given in [5].
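The two rules can be sketched for any number of experts as follows, again assuming basic belief assignments stored as dictionaries from `frozenset` focal elements to masses. The function names and the two toy assignments are illustrative assumptions.

```python
from itertools import product

def conjunctive(bbas):
    """Unnormalized conjunctive rule: products of masses go to the intersections."""
    combined = {}
    for focal_sets in product(*(m.items() for m in bbas)):
        inter = focal_sets[0][0]
        weight = 1.0
        for Y, mass in focal_sets:
            inter = inter & Y
            weight *= mass
        combined[inter] = combined.get(inter, 0.0) + weight
    return combined

def dempster(bbas):
    """Normalized conjunctive rule: drop the mass of the empty set and rescale."""
    conj = conjunctive(bbas)
    conflict = conj.pop(frozenset(), 0.0)
    if conflict >= 1.0:
        raise ValueError("total conflict: the normalized rule is undefined")
    return {X: mass / (1.0 - conflict) for X, mass in conj.items()}, conflict

theta = frozenset({"rock", "sand", "silt"})
m1 = {frozenset({"rock"}): 0.6, theta: 0.4}
m2 = {frozenset({"sand"}): 0.5, theta: 0.5}
combined, conflict = dempster([m1, m2])
```

In this toy case the two experts disagree on the singletons, so a mass of 0.6 x 0.5 = 0.3 falls on the empty set before normalization; the cost of enumerating all combinations of focal elements grows quickly with the number of experts, which is acceptable here because each binary classifier has only three focal elements.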

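If the credibility function provides a pessimistic decision,
the plausibility function is often too optimistic. The pignistic
probability [19] is generally considered as a compromise. It is
calculated from a basic belief assignment $m$ for all $X \in 2^\Theta$,
with $X \neq \emptyset$, by:

$$\mathrm{betP}(X) = \sum_{Y \in 2^\Theta, \, Y \neq \emptyset} \frac{|X \cap Y|}{|Y|} \, \frac{m(Y)}{1 - m(\emptyset)}, \tag{15}$$

where $|X|$ is the cardinality of $X$.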
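A short sketch of the pignistic transformation, restricted to the singletons (which is what a decision on one class needs). It assumes the usual definition in which each focal element shares its mass equally among its elements, after renormalizing by the mass of the empty set; the masses are illustrative.

```python
def pignistic(m, theta):
    """Pignistic probability of each class: focal elements share their mass equally."""
    empty_mass = m.get(frozenset(), 0.0)
    bet = {w: 0.0 for w in theta}
    for Y, mass in m.items():
        for w in Y:
            bet[w] += mass / (len(Y) * (1.0 - empty_mass))
    return bet

theta = frozenset({"rock", "sand", "silt"})
m = {
    frozenset({"rock"}): 0.5,
    frozenset({"sand", "silt"}): 0.3,
    theta: 0.2,
}
bet = pignistic(m, theta)
decided = max(sorted(theta), key=lambda w: bet[w])
```

The result is an ordinary probability distribution over the classes, so the information carried by the unions is spread over the singletons and is no longer available for an imprecise decision.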

In this paper, we wish to reject part of the data that we
do not consider in the learning classes. Hence a pessimistic
decision, such as the maximum of the credibility function, is
preferable. Another criterion, proposed by [21], consists in
attributing the class $w_k$ to a new pattern $x$ if:

$$\begin{cases} \mathrm{bel}(w_k)(x) = \max_{1 \leq i \leq n} \mathrm{bel}(w_i)(x), \\ \mathrm{bel}(w_k)(x) \geq \mathrm{bel}(w_k^c)(x). \end{cases} \tag{16}$$

The addition of this second condition to the maximum of
credibility allows a decision only if it is nonambiguous, i.e.
if we believe more in the class $w_k$ than in the subset of the
other classes (the complement of the class).

Another approach, proposed in [22], considers the plausibility
functions and gives the possibility to decide on any element
of $2^\Theta$, and not only on the singletons as previously. Thus the new
pattern $x$ belongs to the element $A$ of $2^\Theta$ if:

$$A = \arg\max_{X \in 2^\Theta} \left( m_b(X)(x) \, \mathrm{pl}(X)(x) \right), \tag{17}$$

where $m_b$ is a basic belief assignment given by:

$$m_b(X) = K_b \lambda_X \left( \frac{1}{|X|^r} \right), \tag{18}$$

$r$ is a parameter in $[0, 1]$ allowing a decision ranging from a
single class ($r = 1$) to the total indecision $\Theta$ ($r = 0$).
$\lambda_X$ allows the integration of the lack of knowledge on one of
the elements $X$ in $2^\Theta$. In this paper, we will choose $\lambda_X = 1$.
The constant $K_b$ is the normalization factor given by the condition
of equation (9).
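Both decision rules can be sketched on a small example. The sketch assumes that the reject test uses "greater than or equal", that the weight of a subset X is 1/|X|^r with the lack-of-knowledge coefficient set to 1, and that the normalization constant can be dropped because it does not change the argmax; function names and masses are illustrative.

```python
from itertools import combinations

def bel(m, X):
    return sum(mass for Y, mass in m.items() if Y and Y <= X)

def pl(m, X):
    return sum(mass for Y, mass in m.items() if Y & X)

def max_credibility_with_reject(m, theta):
    """Class maximizing bel, or None (reject) if its complement is more credible."""
    best = max(sorted(theta), key=lambda w: bel(m, frozenset({w})))
    single = frozenset({best})
    return best if bel(m, single) >= bel(m, theta - single) else None

def decide_on_unions(m, theta, r):
    """Maximize pl(X) / |X|**r over the non-empty subsets X of theta."""
    subsets = [frozenset(c) for k in range(1, len(theta) + 1)
               for c in combinations(sorted(theta), k)]
    return max(subsets, key=lambda X: pl(m, X) / len(X) ** r)

theta = frozenset({"rock", "sand", "silt"})
# Illustrative masses: sand and silt are hard to separate.
m = {frozenset({"rock"}): 0.1, frozenset({"sand"}): 0.4,
     frozenset({"silt"}): 0.35, theta: 0.15}
```

On these masses the first rule rejects the pattern (sand is the most credible class, but less credible than its complement), while the second rule moves from the singleton sand at r = 1 to the union of sand and silt at r = 0.6 and to the whole frame at r = 0.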



IV. Belief functions theory for classification with support vector machines

In the previous sections, we have described the two main
strategies used to build a multi-class classifier from binary
classifiers: the one-versus-rest and one-versus-one approaches.
Most of the time the formalism used to combine the binary
classifier results differs according to the strategy. [23] proposed
a combination approach of the binary classifier decisions
based on the belief functions theory, giving a unique formalism
for both one-versus-one and one-versus-rest strategies. The
basic belief assignments are defined from the confusion matrices
of the binary classifiers. Working directly on the classifier
decisions causes a loss of the information originally contained in
the decision functions. Thus it could be better to define the basic
belief assignments from the decision functions rather than
from the confusion matrices (i.e. from the classifier decisions).

However, the decision functions are not normalized, so
we can have problems in the combination of these functions,
especially with the one-versus-rest strategy. [8] has defined a
probability from these decision functions $f$ such as:

$$P(y = 1 | f) = \frac{1}{1 + \exp(A f + B)}, \tag{19}$$

where $A$ and $B$ are calculated in order to get
$P(y = 1 | f = 0) = 0.5$. Different approaches have
been proposed for the estimation of these parameters (see
[24]).
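The sigmoid normalization can be sketched in a few lines. The values of A and B below are illustrative assumptions, not fitted parameters; in practice they are estimated from learning data.

```python
import math

def sigmoid_probability(f, A, B):
    """P(y = 1 | f) = 1 / (1 + exp(A * f + B))."""
    return 1.0 / (1.0 + math.exp(A * f + B))

# Illustrative parameters (assumption): A < 0 so that the probability grows with f.
A, B = -2.0, 0.0
p_boundary = sigmoid_probability(0.0, A, B)
p_positive = sigmoid_probability(1.0, A, B)
p_negative = sigmoid_probability(-1.0, A, B)
```

With B = 0 a pattern lying exactly on the hyperplane gets probability 0.5, and the slope A controls how fast the probability saturates on each side.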

[7] uses a one-class SVM, introduced by [25], so the
combination can be done only with a one-versus-rest strategy.
The decision functions coming from this particular classifier
are employed to define some plausibility functions on the
singletons $w_i$:

$$\mathrm{pl}(w_i)(x) = \frac{f_i(x) + \rho}{\rho}, \tag{20}$$

where $f_i(x)$ is the decision function giving the distance
between $x$ and the frontier of class $w_i$, and $\rho$ is a factor estimated
in the one-class SVM algorithm that depends on the kernel (cf. [25]).

The first originality of this paper resides in the definition
of the basic belief assignments, which we obtain directly from
the decision functions $f_i$ given by equations (6) or (7).
The basic idea consists in considering that the data dispersion in
each of the half-spaces defined by the hyperplane follows an
exponential distribution. This distribution gives a dispersion of
the data around the mean, more or less near to the hyperplane,
with the opportunity to observe data very far away from
the hyperplane. Doing this we keep the basic idea of the
SVM. Hence, according to the sign of the decision function
(i.e. the half-space defined by the hyperplane), the belief
can be obtained by the cumulative distribution function of the
exponential distribution (see figure 2). We define the basic
belief assignment by:

$$\begin{cases}
m_i(w_i)(x) = \alpha_i \left( \left(1 - \frac{1}{2} \exp\left(-\frac{1}{\lambda_{i,p}} f_i(x)\right)\right) \mathbf{1}_{[0,+\infty[}(f_i(x)) + \frac{1}{2} \exp\left(-\frac{1}{\lambda_{i,n}} f_i(x)\right) \mathbf{1}_{]-\infty,0[}(f_i(x)) \right) \\
m_i(w_i^c)(x) = \alpha_i \left( \frac{1}{2} \exp\left(-\frac{1}{\lambda_{i,p}} f_i(x)\right) \mathbf{1}_{[0,+\infty[}(f_i(x)) + \left(1 - \frac{1}{2} \exp\left(-\frac{1}{\lambda_{i,n}} f_i(x)\right)\right) \mathbf{1}_{]-\infty,0[}(f_i(x)) \right) \\
m_i(\Theta)(x) = 1 - \alpha_i
\end{cases}$$

where $\alpha_i$ is a discounting factor of the basic belief assignment,
and $\lambda_{i,p}$ and $\lambda_{i,n}$ are parameters depending on the decision
functions of class $w_i$ that we define in equation (21). The ratio
1/2 is introduced to increase the belief in the class related to the
half-space where the data are located (see figure 2). There are
many ways to choose or to calculate the discounting factor, which
is generally close to one. [26] proposes a method to obtain the
discounting factor that optimizes the decision, taking advantage
of the pignistic probability. We propose here to calculate this
discounting factor according to the good classification rate of
the binary classifiers. The good classification rates are calculated
by studying the sign of the decision function $f_i$ on the
learning data used to determine the model of the binary classifiers.




Fig. 2. Illustration of the basic belief assignment based on the cumulative
distribution function of the exponential distribution.

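We propose to estimate the $\lambda_i$ parameters from the mean
of the decision functions on the learning data in order to be
coherent with the exponential distribution. Hence $\lambda_{i,p}$ and $\lambda_{i,n}$
are given by:

$$\begin{cases}
\lambda_{i,p} = \frac{1}{l} \sum_{t=1}^{l} f_i(x_t) \, \mathbf{1}_{[0,+\infty[}(f_i(x_t)) \\
\lambda_{i,n} = \frac{1}{l} \sum_{t=1}^{l} f_i(x_t) \, \mathbf{1}_{]-\infty,0[}(f_i(x_t)).
\end{cases} \tag{21}$$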
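The construction of a basic belief assignment from one decision function can be sketched as below. The sketch reads the model as follows, which is an assumption about the garbled formulas: on the side of the hyperplane where the pattern lies, the supported hypothesis receives 1 - (1/2) exp(-f / lambda), the opposite hypothesis receives (1/2) exp(-f / lambda), and the remaining mass 1 - alpha goes to the whole frame. This reading makes the masses sum to one and be continuous at f = 0. The decision values and the discounting factor are illustrative.

```python
import math

def exponential_parameters(decision_values):
    """lambda_p and lambda_n: sums of the non-negative and of the negative decision
    values, each divided by the total number l of learning data."""
    l = len(decision_values)
    lam_p = sum(f for f in decision_values if f >= 0) / l
    lam_n = sum(f for f in decision_values if f < 0) / l
    return lam_p, lam_n

def bba_from_decision(f, lam_p, lam_n, alpha):
    """Masses on the class, on its complement and on the whole frame."""
    if f >= 0:
        doubt = 0.5 * math.exp(-f / lam_p)
        m_class, m_complement = alpha * (1.0 - doubt), alpha * doubt
    else:
        doubt = 0.5 * math.exp(-f / lam_n)  # lam_n < 0, so the exponent is negative
        m_class, m_complement = alpha * doubt, alpha * (1.0 - doubt)
    return {"class": m_class, "complement": m_complement, "frame": 1.0 - alpha}

train_values = [1.5, 0.8, 2.1, -0.9, -1.7, -0.4]  # toy decision values on learning data
lam_p, lam_n = exponential_parameters(train_values)
alpha = 0.95  # assumed discounting factor, e.g. a good classification rate
m_pos = bba_from_decision(2.0, lam_p, lam_n, alpha)
m_zero = bba_from_decision(0.0, lam_p, lam_n, alpha)
m_neg = bba_from_decision(-2.0, lam_p, lam_n, alpha)
```

A pattern on the hyperplane gets the same mass on both hypotheses, and the belief moves towards one side as the pattern goes deeper into the corresponding half-space. The sketch does not guard against a learning set with no positive or no negative decision value, which would make one of the parameters zero.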

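This proposed basic belief assignment model allows a good
modeling of the information given by the binary classifiers
in order to combine them with both one-versus-rest and
one-versus-one strategies. Thus for a one-versus-rest strategy,
$w_i^c$ represents the union of the classes other than $w_i$, i.e.
$\Theta \setminus \{w_i\}$. In the one-versus-one case, the decision functions
$f_i$, $i = 1, \ldots, n(n - 1)/2$, can be rewritten as $f_{ij}$ with $i < j$
and $i, j = 1, \ldots, n$, where $i$ and $j$ correspond to the considered
classes $w_i$ and $w_j$. In this one-versus-one case, $w_i^c$ must be
seen as $w_j$ and the basic belief assignments are given by:

$$\begin{cases}
m_{ij}(w_i)(x) = \alpha_{ij} \left( \left(1 - \frac{1}{2} \exp\left(-\frac{1}{\lambda_{ij,p}} f_{ij}(x)\right)\right) \mathbf{1}_{[0,+\infty[}(f_{ij}(x)) + \frac{1}{2} \exp\left(-\frac{1}{\lambda_{ij,n}} f_{ij}(x)\right) \mathbf{1}_{]-\infty,0[}(f_{ij}(x)) \right) \\
m_{ij}(w_j)(x) = \alpha_{ij} \left( \frac{1}{2} \exp\left(-\frac{1}{\lambda_{ij,p}} f_{ij}(x)\right) \mathbf{1}_{[0,+\infty[}(f_{ij}(x)) + \left(1 - \frac{1}{2} \exp\left(-\frac{1}{\lambda_{ij,n}} f_{ij}(x)\right)\right) \mathbf{1}_{]-\infty,0[}(f_{ij}(x)) \right) \\
m_{ij}(\Theta)(x) = 1 - \alpha_{ij}
\end{cases}$$

with

$$\begin{cases}
\lambda_{ij,p} = \frac{1}{l} \sum_{t=1}^{l} f_{ij}(x_t) \, \mathbf{1}_{[0,+\infty[}(f_{ij}(x_t)) \\
\lambda_{ij,n} = \frac{1}{l} \sum_{t=1}^{l} f_{ij}(x_t) \, \mathbf{1}_{]-\infty,0[}(f_{ij}(x_t)).
\end{cases} \tag{22}$$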

We use here the normalized conjunctive rule (equation (14)).
Thus we can apply this rule in order to combine the $n$
basic belief assignments in the one-versus-rest case and the
$n(n - 1)/2$ basic belief assignments in the one-versus-one
case. When the data overlap a lot, more complicated rules
such as those proposed in [5] could be preferred.
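The difference between the two strategies lies only in the subset that receives the "complement" mass, as the following sketch tries to show before combining the pairwise assignments with the normalized conjunctive rule. The helper names and the triples of masses are illustrative assumptions; the pairwise combination is applied successively, which relies on the associativity of the normalized rule.

```python
from functools import reduce

def to_focal_sets(masses, theta, i, j=None):
    """Place (class, complement, frame) masses on subsets of theta.

    One-versus-rest (j is None): the complement is theta minus {w_i}.
    One-versus-one: the complement is read as the singleton {w_j}.
    """
    m_class, m_complement, m_frame = masses
    other = theta - {i} if j is None else frozenset({j})
    return {frozenset({i}): m_class, other: m_complement, theta: m_frame}

def dempster_pair(m1, m2):
    """Normalized conjunctive combination of two basic belief assignments."""
    raw = {}
    for Y1, a in m1.items():
        for Y2, b in m2.items():
            raw[Y1 & Y2] = raw.get(Y1 & Y2, 0.0) + a * b
    conflict = raw.pop(frozenset(), 0.0)
    return {X: v / (1.0 - conflict) for X, v in raw.items()}

theta = frozenset({1, 2, 3})
# Illustrative masses for the three pairwise classifiers f_12, f_13 and f_23.
pairwise = {(1, 2): (0.20, 0.70, 0.10),
            (1, 3): (0.15, 0.75, 0.10),
            (2, 3): (0.45, 0.45, 0.10)}
bbas = [to_focal_sets(m, theta, i, j) for (i, j), m in pairwise.items()]
combined = reduce(dempster_pair, bbas)
rest_example = to_focal_sets((0.6, 0.3, 0.1), theta, 1)
```

In this toy example class 1 loses both of its duels and ends up with a small combined mass, while classes 2 and 3 remain close to each other, which is the kind of situation where a decision on their union becomes attractive.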

For the decision step, we want to keep the possibility to
take the decision on a union of classes (i.e. when we cannot
decide between two particular classes) and also not to take a
decision when our belief in one focal element is too weak.
Thus we propose the following decision rule in two steps:

1) The decision rule of the maximum of the credibility with
reject defined by equation (16) is applied in order to
determine the patterns that do not belong to the learning
classes.

2) The decision rule given by equation (17) is next
applied to the non-rejected patterns.

Another possible decision process could be first the
application of the decision rule given by equation (17), and
next the decision rule of the maximum of the credibility with
reject on the imprecise patterns that were first assigned to unions
of classes. On the illustrated data given in the next section,
we obtain similar results. We call this decision process (2-1)
and the previous one (1-2).
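The two orderings can be sketched as a small function that chains the outcomes of the two rules. Its inputs are assumed to be already computed: the subset returned by the rule allowing unions, and a boolean telling whether the pattern passes the maximum-of-credibility test. The names and the `REJECT` marker are illustrative.

```python
REJECT = "reject"

def two_step_decision(union_decision, passes_credibility_test, order="1-2"):
    """Chain the two rules.

    union_decision: frozenset returned by the rule allowing unions of classes.
    passes_credibility_test: False when the maximum-of-credibility rule rejects.
    """
    if order == "1-2":
        # Reject first, then decide (possibly on a union) for the remaining patterns.
        return union_decision if passes_credibility_test else REJECT
    if order == "2-1":
        # Decide first; only the imprecise decisions (unions) can be rejected.
        if len(union_decision) == 1:
            return union_decision
        return union_decision if passes_credibility_test else REJECT
    raise ValueError("order must be '1-2' or '2-1'")

precise = frozenset({"rock"})
imprecise = frozenset({"sand", "silt"})
```

The only patterns treated differently are those decided on a single class but failing the credibility test: process (1-2) rejects them, while process (2-1) keeps the hard decision.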

V. Application 

A. Sonar data 

Our database contains 42 sonar images provided by the
GESMA (Groupe d'Etudes Sous-Marines de l'Atlantique).
These images were obtained with a Klein 5400 lateral sonar 
with a resolution of 20 to 30 cm in azimuth and 3 cm in range. 
The sea-bottom depth was between 15 m and 40 m. 

Some experts have manually segmented these images, giving
the kind of sediment (rock, cobble, sand, silt, ripple (vertical or
at 45 degrees)), shadow or other (typically shipwrecks) parts
on images. It is very difficult to discriminate the rock and
the cobble, and also the sand and the silt. However, it is important
for the sedimentologists to discriminate the sand and the silt.
The type "ripple" can be ripple of sand or ripple of
silt. Hence, from the point of view of the sedimentologists, we
consider only three classes of sediment: C1 = rock-cobble,
C2 = sand and C3 = silt. In order to evaluate our decision
process, we take the ripple as a fourth class (C4) that is
not learned.

Each image is cut into tiles of size 32x32 pixels (about
6.5 m by 6.5 m). From these tiles, we keep 3500 tiles of
each class with only one kind of sediment in the tile. Hence,
our database is made of 4x3500 tiles. We consider 2/3 of them
for the learning step (only for the three classes of sediment)
and 1/3 of them for the test step (i.e. 1167 tiles for each kind
of sediment).
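A minimal sketch of the tiling step, assuming non-overlapping tiles and assuming that incomplete tiles on the image border are discarded (neither point is stated in the text). The image is represented as a list of rows of gray levels, and the synthetic image is only there to exercise the function.

```python
def cut_into_tiles(image, size=32):
    """Split a 2-D list of gray levels into non-overlapping size x size tiles."""
    rows, cols = len(image), len(image[0])
    tiles = []
    for top in range(0, rows - size + 1, size):
        for left in range(0, cols - size + 1, size):
            tiles.append([row[left:left + size] for row in image[top:top + size]])
    return tiles

# Synthetic 70 x 100 image (illustrative): 2 x 3 complete tiles fit in it.
image = [[(r + c) % 256 for c in range(100)] for r in range(70)]
tiles = cut_into_tiles(image)
```

Keeping only the tiles that contain a single kind of sediment then requires the expert segmentation of the same area, which is not modeled here.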

In order to classify the tiles of size 32x32 pixels, we
first have to extract texture parameters from each tile. Here,
we choose the co-occurrence matrices approach [1]. The
co-occurrence matrices are calculated by counting the
occurrences of pairs of gray levels for two pixels. Six parameters
given by Haralick are calculated: homogeneity, contrast
estimation, entropy estimation, correlation, directivity, and
uniformity. For these six parameters, we calculate
their mean over four directions: 0, 45, 90 and 135 degrees.
The problem with co-occurrence matrices is their non-invariance
in translation. Typically, this problem can appear in the
characterization of a ripple texture. Other feature extraction
approaches can be used, such as the run-lengths matrix, the wavelet
transform and the Gabor filters [1].
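The co-occurrence approach can be sketched as follows for four of the parameters. The formulas used for contrast, entropy, homogeneity and uniformity are common textbook definitions and may differ from the exact ones used in the experiments; the matrix is not symmetrized, the pixel distance is fixed to 1, and tiles are assumed to be already quantized to a small number of gray levels. Correlation and directivity are left out.

```python
import math

def cooccurrence(tile, dr, dc, levels):
    """Normalized co-occurrence matrix of gray-level pairs at offset (dr, dc)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(tile), len(tile[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[tile[r][c]][tile[r2][c2]] += 1
                total += 1
    return [[n / total for n in row] for row in counts]

def features(p):
    """Contrast, entropy, homogeneity and uniformity of a normalized matrix p."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row)]
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    entropy = -sum(v * math.log(v) for _, _, v in cells if v > 0)
    homogeneity = sum(v / (1 + (i - j) ** 2) for i, j, v in cells)
    uniformity = sum(v * v for _, _, v in cells)
    return contrast, entropy, homogeneity, uniformity

# Offsets for 0, 45, 90 and 135 degrees at distance 1 (rows grow downwards).
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]

def mean_features(tile, levels):
    """Average each parameter over the four directions."""
    per_direction = [features(cooccurrence(tile, dr, dc, levels)) for dr, dc in OFFSETS]
    return [sum(values) / len(values) for values in zip(*per_direction)]

flat = [[1] * 8 for _ in range(8)]                       # perfectly homogeneous tile
stripes = [[c % 2 for c in range(8)] for _ in range(8)]  # vertical stripes
```

On the striped tile the contrast is maximal in three directions and zero along the stripes, so averaging over the four directions hides the orientation of the texture; this is one reason why a directivity parameter is useful for ripples.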

We use libSVM [27] and, after comparing several kernels,
we have retained the radial basis function (with $\gamma = 1/6$,
where 6 is the dimension of the data), and we take the weighting
of the error $C = 1$ because of the data overlap.

B. Results 

Table I shows the results for the SVM classifier with
the one-versus-one and one-versus-rest strategies. We note that
there are many errors between the sand (C2) and the silt (C3),
which are two homogeneous sediments. The ripple (C4), the
unlearned class, is more heterogeneous than the sand and the
silt; this is why it is more often classified as rock (C1).

TABLE I
Results of the SVM classifier for both strategies
one-versus-one and one-versus-rest.

            one-vs-one                 one-vs-rest
  %       C1      C2      C3         C1      C2      C3
  C1    91.00    8.83    0.17      84.40   15.08    0.51
  C2     7.11   80.72   12.17       2.57   61.27   36.16
  C3     2.06   30.42   67.52       0.86   22.71   76.44
  C4    65.13   33.16    1.71      52.36   45.41    2.21

Table II shows the same results, but with the proposed approach
based on the belief functions theory (presented in section IV), with
the decision based on the pignistic probability. This approach
provides results similar to those of the basic versions of the
SVM (Table I). Note that the one-versus-rest strategy provides
more errors between the sand and the silt. This can be explained
by the data overlap. In the rest of the paper we consider
only the one-versus-one strategy.





TABLE II
Results of the SVM classifier with belief functions theory for
both strategies one-versus-one and one-versus-rest.

            one-vs-one                 one-vs-rest
  %       C1      C2      C3         C1      C2      C3
  C1    88.00   11.91    0.09      91.51    6.18    2.31
  C2     4.80   83.29   11.91       8.83   20.90   70.27
  C3     1.20   32.05   66.75       2.06    5.23   92.71
  C4    56.38   41.99    1.63      67.52   20.57   11.91



Table III gives the results with possible decision on
unions with r = 0.6. We can see that this kind of cautious
decision provides fewer hard errors (i.e. deciding one kind of
sediment instead of another). Of course these results depend on
the value of r, which provides a more or less cautious decision, as
we can see in figure 3. If we add the possibility of rejection
to these results (Table IV), we can see that most of the rejected
tiles come from the ripple (the unknown class C4). For a given
class, the majority of the rejected tiles come from the unions
(imprecise data). Of course this rejection does not depend on
the value of r if we begin with the rejection in our decision
process (1-2) (presented in section IV).

Figure 3 shows the results of the classification of the class
of ripple (C4) according to the value of r, without possible
rejection. Of course, when the value of r is low, the data of
the three learning classes are classified on the unions. We can
distinguish three kinds of working intervals on these data:

• r ∈ [0; 0.3]: the classifier is too undecided,

• r ∈ [0.4; 0.6]: the ambiguity between the classes is
correctly considered,

• r ∈ [0.7; 1]: the decision is too hard.




[Figure 3: legend entries C1∪C2, C1∪C3, C2∪C3, C1∪C2∪C3]

Fig. 3. Classification of the class of ripple (C4) with possible decision on
union according to r.

TABLE III
Results with a belief combination with possible decision on unions.

TABLE IV
Results with a belief combination with possible decision on unions and on the rejected class.

Depending on the application, if we want to favor the
hard decision at the expense of the rejection, we can try to
decide first, possibly on the unions, and next try to reject only
on the unions. In this case we can choose a higher value of
r. For example with r = 0.8, we propose a comparison of
the decision processes (1-2) and (2-1), given in Table V.
Of course the decision process (2-1) rejects less data, but it is
only with the rock (class C1) that we gain, and we reject less
ripple (unknown class C4). Hence, it seems that the decision
process (1-2) is better for this application.

Now let us consider the tiles containing more than one
kind of sediment. We still train the SVM classifier with
the same parameters and the one-versus-one strategy on the
homogeneous tiles of the three classes rock (C1), sand (C2)
and silt (C3), as previously. For the tests, we only take 299
tiles with the classes: S1 = tiles with rock and sand, S2 = sand
and silt, S3 = silt and ripple and S4 = sand and ripple.

Table VI presents the results obtained with the SVM classifier
with the classical voting combination and with a belief
combination with pignistic decision and with credibility with reject
decision. For the two classes S1 and S2, the tiles contain only
learned sediments (rock and sand for S1, and sand and silt for
S2). For S1 and S2 the classifiers without reject classify these
tiles more often as sand. The rejection decreases the errors, but
for S2 the rejection is essentially on the sand. The two classes S3
and S4 contain ripple, the unknown class. Here also, we note
a confusion with the rock sediment, which is a heterogeneous
texture like the ripple. The rejection for these two classes
works well, because a large part of the tiles classified as rock
are rejected, and for S3 a large part of the tiles classified as
sand are also rejected.

Table VII shows the results with possible decision on the
union with r = 0.6, with and without possible rejection. The
addition of the possible decision on the union reduces the
errors. The rejection is essentially on the tiles classified on
the unions, except for S2 (sand and silt), where a lot of tiles
classified as sand are rejected, maybe because of the learning step.

Hence, for the tiles containing more than one kind of
sediment, our decision support could help human experts.
Of course, in this case, the evaluation is really difficult. In [2]
we have proposed confusion matrices taking into account the
proportion of each sediment in a tile.

VI. Conclusions 

We have proposed an original approach based on the belief
functions theory for the combination of binary classifiers
coming from the SVM with one-versus-one or one-versus-rest
strategies. The modeling of the basic belief assignments
is proposed directly from the decision functions given by the
SVM. These basic belief assignments make it possible to take
correctly into account the principle of binary classification with
SVM, by comparison with a hyperplane in linear or nonlinear
cases.

The belief functions theory provides a decision support
without necessarily deciding on an exclusive class. The decision
process that we have proposed, with possible outlier rejection
and with possible decision on the union of classes, is very
interesting because it works like the intuitive classification
that a human could perform based on the position of the support
vectors and considering the ambiguity of the classes. This
decision support can really help experts for seabed characterization
from sonar images. We have seen, from the point of view of the
sedimentologists, that if we only consider the different kinds
of sediments (rock, sand and silt), the ambiguity between the
sand and the silt is well recognized and the ripple can be partly
rejected.

TABLE V
Results with a belief combination with possible decision on unions and on the rejected class (1-2) and with rejection on the
union only (2-1).

TABLE VI
Results of the SVM classifier with the classical voting
combination and a belief combination with pignistic decision
and with credibility with reject decision.

TABLE VII
Results with a belief combination with possible decision on unions with and without possible rejection.